AITopics | rp gradient

Collaborating Authors

rp gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Gradient Informed Proximal Policy Optimization

Neural Information Processing SystemsFeb-8-2026, 13:35:45 GMT

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm.

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Gradient Informed Proximal Policy Optimization

Neural Information Processing SystemsOct-8-2025, 05:40:18 GMT

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm.

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Gradient Informed Proximal Policy Optimization

Son, Sanghyun, Zheng, Laura Yu, Sullivan, Ryan, Qiao, Yi-Ling, Lin, Ming C.

arXiv.org Artificial IntelligenceDec-14-2023

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an {\alpha}-policy that stands as a locally superior policy. By adaptively modifying the {\alpha} value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo.

analytical gradient, gradient, rp gradient, (13 more...)

arXiv.org Artificial Intelligence

2312.0871

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

Parmas, Paavo, Rasmussen, Carl Edward, Peters, Jan, Doya, Kenji

arXiv.org Machine LearningFeb-4-2019

Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, the direction of the gradients becomes essentially random. We show that reparameterization gradients suffer from the problem, while likelihood ratio gradients are robust. Using our insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible, and allows for almost arbitrary models and policies, while simultaneously matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by $10^6$ times.

estimator, gradient, variance, (14 more...)

arXiv.org Machine Learning

1902.0124

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Variance reduction properties of the reparameterization trick

Xu, Ming, Quiroz, Matias, Kohn, Robert, Sisson, Scott A.

arXiv.org Machine LearningOct-5-2018

The reparameterization trick is widely used in variational inference as it yields more accurate estimates of the gradient of the variational objective than alternative approaches such as the score function method. Although there is overwhelming empirical evidence in the literature showing its success, there is relatively little research exploring why the reparameterization trick is so effective. We explore this under the idealized assumptions that the variational approximation is a mean-field Gaussian density and that the log of the joint density of the model parameters and the data is a quadratic function that depends on the variational mean. From this, we show that the marginal variances of the reparameterization gradient estimator are smaller than those of the score function gradient estimator. We apply the result of our idealized analysis to real-world examples.

gradient, marginal variance, variance, (14 more...)

arXiv.org Machine Learning

1809.1033

Country: